
    Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

    The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting, where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, the minimax optimal sample complexity for obtaining a nearly optimal policy with sampling access to a generative model scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of $\frac{K}{(1-\gamma)^{3}\varepsilon^{2}}$ (resp. $\frac{K}{(1-\gamma)^{4}\varepsilon^{2}}$), up to some logarithmic factor. Here $K$ is the feature dimension and $\gamma\in(0,1)$ is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for arbitrarily large-scale MDPs, both the model-based approach and Q-learning are sample-efficient when $K$ is relatively small, hence the title of this paper.
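    To make the scaling concrete, the Python sketch below evaluates the orders of magnitude quoted above with all constants and logarithmic factors dropped (the abstract does not specify them). The tabular comparison uses the standard tabular minimax rate $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^{3}\varepsilon^{2}}$ for a generative model. The function names and the example values of $K$, $|\mathcal{S}|$, $|\mathcal{A}|$, $\gamma$ and $\varepsilon$ are hypothetical choices for illustration only.

```python
def model_based_order(K, gamma, eps):
    """K / ((1 - gamma)^3 * eps^2): model-based bound, constants and log factors dropped."""
    return K / ((1.0 - gamma) ** 3 * eps ** 2)


def q_learning_order(K, gamma, eps):
    """K / ((1 - gamma)^4 * eps^2): Q-learning bound, constants and log factors dropped."""
    return K / ((1.0 - gamma) ** 4 * eps ** 2)


def tabular_order(num_states, num_actions, gamma, eps):
    """|S||A| / ((1 - gamma)^3 * eps^2): tabular minimax rate, constants and log factors dropped."""
    return num_states * num_actions / ((1.0 - gamma) ** 3 * eps ** 2)


if __name__ == "__main__":
    # Hypothetical problem sizes: a huge state space but only K = 20 features.
    K, num_states, num_actions = 20, 10**6, 10
    gamma, eps = 0.9, 0.1
    print(f"model-based: {model_based_order(K, gamma, eps):.3g}")
    print(f"Q-learning:  {q_learning_order(K, gamma, eps):.3g}")
    print(f"tabular:     {tabular_order(num_states, num_actions, gamma, eps):.3g}")
```

    With these illustrative values, the feature-based bound is smaller than the tabular one by the factor $|\mathcal{S}||\mathcal{A}|/K$, and the Q-learning bound exceeds the model-based one by the extra factor $1/(1-\gamma)$.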

    The Isotonic Mechanism for Exponential Family Estimation

    In 2023, the International Conference on Machine Learning (ICML) required authors with multiple submissions to rank their submissions based on perceived quality. In this paper, we aim to employ these author-specified rankings to enhance peer review in machine learning and artificial intelligence conferences by extending the Isotonic Mechanism (Su, 2021, 2022) to exponential family distributions. This mechanism generates adjusted scores that closely align with the original scores while adhering to the author-specified rankings. Although the mechanism applies to a broad spectrum of exponential family distributions, implementing it does not require knowledge of the specific distribution form. We demonstrate that an author is incentivized to provide accurate rankings when her utility takes the form of a convex additive function of the adjusted review scores. For a certain subclass of exponential family distributions, we prove that the author reports truthfully only if the question involves only pairwise comparisons between her submissions, thus indicating the optimality of ranking in truthful information elicitation. Lastly, we show that the adjusted scores dramatically improve the accuracy of the original scores and achieve nearly minimax optimality, with statistical consistency, for estimating the true scores when the true scores have bounded total variation.
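    In Su's original Isotonic Mechanism, the raw review scores are replaced by the closest score vector, in squared error, that is non-increasing along the author's ranking. The Python sketch below computes that least-squares projection with the pool-adjacent-violators algorithm. It is an assumption here that the exponential-family extension reduces to this same computation (the abstract only states that the distribution form need not be known), and the helper names `pava_nonincreasing` and `isotonic_adjust` are hypothetical.

```python
from typing import List, Sequence


def pava_nonincreasing(values: Sequence[float]) -> List[float]:
    """Least-squares fit of a non-increasing sequence via pool adjacent violators."""
    blocks = []  # each block is [sum_of_values, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean is below the current one (a violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every entry of a pooled block gets the block mean
    return fitted


def isotonic_adjust(raw_scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """ranking[0] is the index of the submission the author ranks best (hypothetical convention)."""
    ordered = [raw_scores[i] for i in ranking]  # raw scores listed from best-ranked to worst-ranked
    fitted = pava_nonincreasing(ordered)
    adjusted = [0.0] * len(raw_scores)
    for pos, idx in enumerate(ranking):
        adjusted[idx] = fitted[pos]  # map fitted values back to the original submission order
    return adjusted


if __name__ == "__main__":
    # Submission 0 is ranked best but received a lower raw score than submission 1.
    print(isotonic_adjust([5.0, 6.0, 4.0], [0, 1, 2]))  # → [5.5, 5.5, 4.0]
```

    Pooling averages the scores within each violating block, so the adjusted scores respect the ranking while keeping the same total as the raw scores.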

    Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

    This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal; we strengthen it in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_{\infty}$ loss. All of this happens even when nearly a constant fraction of observations is corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, hence the title of this paper.
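    A common way to write the convex program for robust PCA with noise, outliers, and missing entries is $\min_{L,S}\ \tfrac{1}{2}\|\mathcal{P}_{\Omega}(L+S-M)\|_F^2+\lambda\|L\|_*+\tau\|S\|_1$, where $\Omega$ is the set of observed entries of the data matrix $M$. The NumPy sketch below solves it by alternating proximal gradient steps: singular value thresholding for the low-rank part $L$ and entrywise soft thresholding for the sparse part $S$. The formulation, the solver, the iteration count, and the values of $\lambda$ and $\tau$ in the demo are assumptions for illustration, not the paper's exact algorithm or tuned constants.

```python
import numpy as np


def svt(X, thresh):
    """Singular value thresholding: proximal operator of thresh * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt


def soft_threshold(X, thresh):
    """Entrywise soft thresholding: proximal operator of thresh * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - thresh, 0.0)


def objective(L, S, M, mask, lam, tau):
    """0.5*||P_Omega(L + S - M)||_F^2 + lam*||L||_* + tau*||S||_1."""
    residual = mask * (L + S - M)
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    return 0.5 * np.sum(residual ** 2) + lam * nuclear + tau * np.abs(S).sum()


def robust_pca(M, mask, lam, tau, iters=200):
    """Alternating proximal gradient on (L, S); mask is a 0/1 float array of observed entries."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    history = [objective(L, S, M, mask, lam, tau)]
    for _ in range(iters):
        # The gradient of the quadratic term w.r.t. L (and w.r.t. S) is mask * (L + S - M).
        L = svt(L - mask * (L + S - M), lam)
        S = soft_threshold(S - mask * (L + S - M), tau)
        history.append(objective(L, S, M, mask, lam, tau))
    return L, S, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r, p, sigma = 60, 2, 0.9, 0.1
    L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 ground truth
    outliers = rng.random((n, n)) < 0.05
    S0 = np.where(outliers, 10.0 * rng.choice([-1.0, 1.0], size=(n, n)), 0.0)  # gross sparse outliers
    mask = (rng.random((n, n)) < p).astype(float)  # observed entries
    M = L0 + S0 + sigma * rng.standard_normal((n, n))
    # Illustrative regularization levels (assumption): roughly sigma*sqrt(n*p) and sigma*sqrt(log n).
    lam = 2.0 * sigma * np.sqrt(n * p)
    tau = 2.0 * sigma * np.sqrt(np.log(n))
    L_hat, S_hat, history = robust_pca(M, mask, lam, tau)
    print("relative error of L_hat:", np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```

    A step size of 1 is valid for each block because the gradient of the quadratic term has Lipschitz constant 1 in $L$ and in $S$ separately, so the recorded objective should not increase from one iteration to the next.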

    Composition and predictive functional analysis of bacterial communities inhabiting Chinese Cordyceps insight into conserved core microbiome.

    BACKGROUND: Over the past few decades, most attention to Chinese Cordyceps-associated endogenous microorganisms has focused on the fungal community that produces critical bioactive components. The bacterial community associated with Chinese Cordyceps has been described previously; however, most studies only presented direct comparisons between Chinese Cordyceps and its microenvironments. In the current study, our objectives were to reveal the composition of the bacterial community structure and to predict its function. RESULTS: We collected samples of Chinese Cordyceps from five sites on the Qinghai-Tibet Plateau and used high-throughput sequencing to quantitatively compare the composition and diversity of the Chinese Cordyceps-associated bacterial community across sites. The results indicated that for the Chinese Cordyceps-associated bacterial community there is no single core microbiome, which was dominated by both Proteobacteria and Actinobacteria. Predictive functional profiling suggested a location-specific functional pattern for Chinese Cordyceps, with bacteria in the external mycelial cortices involved in the biosynthesis of active constituents. CONCLUSIONS: This study is the first to use high-throughput sequencing to compare the bacterial communities inhabiting Chinese Cordyceps and its microhabitat and to reveal the composition and functional capabilities of these bacteria, which will accelerate the study of the functions of bacterial communities in the micro-ecological system of Chinese Cordyceps.